# RLHF Optimization

## RM-R1-DeepSeek-Distilled-Qwen-32B

**License:** MIT · **Publisher:** gaotang · **Tags:** Large Language Model, Transformers, English · **Downloads:** 506 · **Likes:** 0

RM-R1 is a training framework for reasoning reward models (ReasRM), which evaluate candidate answers by generating scoring criteria or reasoning trajectories, yielding interpretable evaluations.
## RM-R1-Qwen2.5-Instruct-7B

**License:** MIT · **Publisher:** gaotang · **Tags:** Large Language Model, Transformers, English · **Downloads:** 23 · **Likes:** 2

RM-R1 is a training framework for reasoning reward models (ReasRM), which evaluate candidate answers by generating scoring criteria or reasoning traces, significantly improving accuracy and interpretability over traditional scalar reward models.
## RM-R1-Qwen2.5-Instruct-14B

**License:** MIT · **Publisher:** gaotang · **Tags:** Large Language Model, Transformers, English · **Downloads:** 21 · **Likes:** 1

RM-R1 is a training framework for reasoning reward models (ReasRM), which evaluate candidate answers by generating scoring criteria or reasoning traces, providing explainable assessments.
## RM-R1-Qwen2.5-Instruct-32B

**License:** MIT · **Publisher:** gaotang · **Tags:** Large Language Model, Transformers, English · **Downloads:** 29 · **Likes:** 1

RM-R1 is a framework for reward modeling through reasoning-trajectory generation, offering significant gains in accuracy and interpretability over traditional methods.
## Llama-3-OffsetBias-RM-8B

**Publisher:** NCSOFT · **Tags:** Large Language Model, Transformers, English · **Downloads:** 1,782 · **Likes:** 23

A reward model trained on the OffsetBias dataset, offering improved robustness against common biases in evaluation models.
## LLaMA-3-8B-SFR-SFT-R

**Publisher:** Salesforce · **Tags:** Large Language Model, Transformers · **Downloads:** 22 · **Likes:** 8

A supervised fine-tuned model based on LLaMA-3-8B, developed by Salesforce for the supervised fine-tuning (SFT) stage of reinforcement learning from human feedback (RLHF) workflows.
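The SFT stage mentioned above typically trains with ordinary next-token cross-entropy, but only on the response tokens: prompt positions are excluded from the loss via the `-100` label convention used by PyTorch cross-entropy. A minimal sketch of that label masking, independent of any particular model (the token IDs are made up for illustration):

```python
IGNORE_INDEX = -100  # convention: labels with this value are skipped by the loss


def build_sft_labels(
    prompt_ids: list[int], response_ids: list[int]
) -> tuple[list[int], list[int]]:
    """Concatenate prompt and response; mask prompt tokens out of the loss.

    The model still *sees* the prompt as input context, but gradient flows
    only through positions whose label is a real token ID.
    """
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


inp, lab = build_sft_labels([101, 7, 8], [42, 43, 2])
print(inp)  # [101, 7, 8, 42, 43, 2]
print(lab)  # [-100, -100, -100, 42, 43, 2]
```

Masking the prompt keeps the model from being rewarded for merely copying instructions, which is why RLHF pipelines apply it before the reward-modeling and RL stages.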
## JSL-MedMNX-7B

**Publisher:** johnsnowlabs · **Tags:** Large Language Model, Transformers, English · **Downloads:** 2,665 · **Likes:** 5

A 7-billion-parameter medical large language model developed by John Snow Labs, optimized for the biomedical domain.
## AmberSafe

**License:** Apache-2.0 · **Publisher:** LLM360 · **Tags:** Large Language Model, Transformers, English · **Downloads:** 52 · **Likes:** 7

AmberSafe is a safety fine-tuned instruction model based on LLM360/AmberChat, part of the LLM360 Pebble series, focused on safe text generation.
## Xwin-LM-13B-V0.2

**Publisher:** Xwin-LM · **Tags:** Large Language Model, Transformers · **Downloads:** 713 · **Likes:** 51

Xwin-LM is a large language model alignment suite built on Llama 2, with strong performance on the AlpacaEval benchmark.
## Xwin-LM-7B-V0.1

**Publisher:** Xwin-LM · **Tags:** Large Language Model, Transformers · **Downloads:** 755 · **Likes:** 77

Xwin-LM is a large language model alignment solution built on Llama 2, covering supervised fine-tuning and reward modeling; the 7B version performs strongly on the AlpacaEval benchmark.
## GPT2-Open-Instruct-V1-Anthropic-HH-RLHF

**License:** MIT · **Publisher:** jtatman · **Tags:** Large Language Model, Transformers, English · **Downloads:** 125 · **Likes:** 5

A dialogue model based on GPT2-open-instruct and fine-tuned on the Anthropic/hh-rlhf dataset, performing well on conversational prompts.
## Reward-Model-DeBERTa-V3-Large-V2

**License:** MIT · **Publisher:** OpenAssistant · **Tags:** Large Language Model, Transformers, English · **Downloads:** 11.15k · **Likes:** 219

This reward model is trained to predict which of two generated answers humans would prefer for a given question. It is suitable for QA evaluation, RLHF reward scoring, and detecting toxic answers.
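Preference reward models of this kind are commonly trained with a pairwise Bradley-Terry objective: the model scores both the human-preferred ("chosen") and dispreferred ("rejected") answers, and the loss is `-log sigmoid(r_chosen - r_rejected)`. A minimal sketch of that objective in plain Python (the training details of any particular model above may differ):

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss: -log P(chosen preferred) = -log sigmoid(r_c - r_r)."""
    return -math.log(sigmoid(r_chosen - r_rejected))


# The loss shrinks as the reward margin in favor of the chosen answer grows:
print(pairwise_reward_loss(2.0, 0.0))  # small loss: chosen scored higher
print(pairwise_reward_loss(0.0, 2.0))  # large loss: rejected scored higher
```

Minimizing this loss pushes the scalar reward to rank preferred answers above dispreferred ones, which is exactly the signal later consumed by the RL stage of RLHF.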
© 2025 AIbase